Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R

نویسندگان

  • Peter Langfelder
  • Bin Zhang
  • Steve Horvath
چکیده

SUMMARY Hierarchical clustering is a widely used method for detecting clusters in genomic data. Clusters are defined by cutting branches off the dendrogram. A common but inflexible method uses a constant height cutoff value; this method exhibits suboptimal performance on complicated dendrograms. We present the Dynamic Tree Cut R package that implements novel dynamic branch cutting methods for detecting clusters in a dendrogram depending on their shape. Compared to the constant height cutoff method, our techniques offer the following advantages: (1) they are capable of identifying nested clusters; (2) they are flexible-cluster shape parameters can be tuned to suit the application at hand; (3) they are suitable for automation; and (4) they can optionally combine the advantages of hierarchical clustering and partitioning around medoids, giving better detection of outliers. We illustrate the use of these methods by applying them to protein-protein interaction network data and to a simulated gene expression data set. AVAILABILITY The Dynamic Tree Cut method is implemented in an R package available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic Tree Cut: in-depth description, tests and applications

In hierarchical clustering, clusters are defined as branches of a cluster tree. The constant height branch cut, a commonly used method to identify branches of a cluster tree, is not ideal for cluster identification in complicated dendrograms. We describe a new dynamic branch cutting approach for detecting clusters in a cluster tree based on their shape. Compared to the constant height cutoff, o...

متن کامل

HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree

Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree leaves of which are the data points and internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height...

متن کامل

Vignette for HCsnip: An R Package for semi-supervised adaptive-height snipping of the Hierarchical Clustering tree

This vignette shows the use of HCsnip package for extracting clusters from the Hierarchical Clustering (HC) tree in semi-supervised way. Rather than cutting the HC tree at a fixed highest (as existing methods do), it snips the tree at variable heights to extract hidden clusters. Cluster extraction process uses both the data matrix from which HC tree is derived and the available follow-up inform...

متن کامل

Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust

Modalclust is an R package which performs Hierarchical Mode Association Clustering (HMAC) along with its parallel implementation (PHMAC) over several processors. Modal clustering techniques are especially designed to efficiently extract clusters in high dimensions with arbitrary density shapes. Further, clustering is performed over several resolutions and the results are summarized as a hierarc...

متن کامل

Hierarchical tree snipping: clustering guided by prior knowledge

MOTIVATION Hierarchical clustering is widely used to cluster genes into groups based on their expression similarity. This method first constructs a tree. Next this tree is partitioned into subtrees by cutting all edges at some level, thereby inducing a clustering. Unfortunately, the resulting clusters often do not exhibit significant functional coherence. RESULTS To improve the biological sig...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 24 5  شماره 

صفحات  -

تاریخ انتشار 2008